Tidy Data
“Happy families are all alike; every unhappy family is unhappy in its
own way.” –– Leo Tolstoy
“Tidy datasets are all alike, but every messy dataset is messy in its
own way.” –– Hadley Wickham
- Key ideas:
  - Cases = Rows
  - Variables = Columns
- How should we define case?
- How do we identify variables?
- Advantages and Disadvantages
Vocabulary
Variable
- In data science, the word variable has a different meaning than in
  mathematics.
  - In algebra, a variable is an unknown quantity.
  - In data, a variable is known; it represents a feature that has been
    measured or observed. “Variable” refers to a specific quantity or
    quality that can vary from one case to another.
- Types of variables
  - quantitative: a number
  - categorical (R calls these factors): tells which category or group a
    case falls into
  - all non-numerical values are categorical, but not all numerical
    values are quantitative
    - e.g., zip codes, IP addresses, dates
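The distinction between quantitative and categorical variables can be sketched in base R. The data frame `people` and all of its values are invented for illustration; the point is only that a zip code is written with digits yet behaves like a label.

```r
# Hypothetical cases: one row per person (all values invented for illustration).
people <- data.frame(
  height_in = c(64.5, 70.0, 68.2),                  # quantitative: arithmetic is meaningful
  zip_code  = c("16802", "02139", "16802"),         # digits, but a label: stored as text
  eye_color = factor(c("brown", "blue", "brown")),  # categorical, stored as a factor
  stringsAsFactors = FALSE
)

mean(people$height_in)    # averaging heights is a sensible summary
table(people$zip_code)    # for a label, counting cases per category is the sensible summary
levels(people$eye_color)  # the categories R recorded for the factor
```

Storing the zip code as text also keeps the leading zero in "02139", which would be lost if the column were read as a number.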
Cases
- Unit of observation or analysis
  - this is extremely context-specific
What is Tidy Data
- Being neat is not what makes data tidy!
There are three interrelated rules which make a dataset tidy:
- Each variable must have its own column.
- Each observation/case must have its own row.
- Each value must have its own cell.
It is your job as the researcher to define the variables,
observations, and values.
- The “tidiness” of the data set depends on the research question. It
is not an inherent property of the data set itself.
- When data are in tidy form, it’s often straightforward to transform
the data into arrangements that are useful for answering interesting
questions.
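One way to make the second rule concrete is to check that no case appears on more than one row. This is a minimal sketch in base R; the data frame `tb` and its values are hypothetical, and the case is assumed to be one country in one year.

```r
# Hypothetical data: the case is defined as one country in one year.
tb <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1999, 2000, 1999, 2000),
  cases   = c(10, 12, 30, 28)
)

# Rule 2 check: no (country, year) pair may appear on more than one row.
one_row_per_case <- anyDuplicated(tb[c("country", "year")]) == 0

# If the research question defined the case as a country instead,
# the same table would fail the check: each country spans two rows.
one_row_per_country <- anyDuplicated(tb["country"]) == 0

one_row_per_case     # TRUE
one_row_per_country  # FALSE
```

The same table passes or fails depending on how the case is defined, which is why tidiness depends on the research question.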
Example of Untidy Data

Example of Tidy Data

- Disadvantages
  - tidy data can be hard for humans to interpret quickly
  - often not the ideal form for creating graphics
- Advantages
  - clear definitions
  - tidy data can easily be wrangled into a form that is useful for
    interpretation and visualization
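As a rough illustration of this trade-off, the sketch below starts from a small tidy table and wrangles it into two forms that are easier to read. The data frame `tidy` and its values are made up; `xtabs()` and `aggregate()` are base R functions.

```r
# Hypothetical tidy table: one row per country-year.
tidy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1999, 2000, 1999, 2000),
  cases   = c(10, 12, 30, 28)
)

# A compact cross-tabulation: one row per country, one column per year.
# Easier for a person to scan, but no longer tidy.
wide <- xtabs(cases ~ country + year, data = tidy)

# A grouped summary: total cases per country.
total_by_country <- aggregate(cases ~ country, data = tidy, FUN = sum)

wide
total_by_country
```

Both results come from one-line calls because each variable already sits in its own column.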
Tidy Data Example
From https://r4ds.had.co.nz/tidy-data.html
You can represent the same underlying data in multiple ways. The
example below shows the same data organised in four different ways. Each
dataset shows the same values of four variables (country, year,
population, and cases), but each dataset organises the values in a
different way.
Which of these is tidy?
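Option 1
library(tidyverse)  # attaches tidyr, which provides table1 through table4b
table1
Option 2
table2
Option 3
table3
Option 4
table4a
table4b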
Example Continued
Table 1!

- Note that all tables contain the same information, just represented
differently. Thus, we can transform Tables 2, 3, 4a/4b into Table 1, and
vice versa.
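The transformations mentioned above can be sketched with tidyr and dplyr. This is one possible approach, not the only one; the result names (`from_table2`, `from_table3`, `from_table4`) are chosen here for illustration, while `table1` through `table4b` are the example datasets that ship with tidyr.

```r
library(tidyr)
library(dplyr)

# table2 stores "cases" and "population" as values of `type`;
# spread them into their own columns.
from_table2 <- table2 %>%
  pivot_wider(names_from = type, values_from = count)

# table3 packs two values into one cell ("cases/population"); split the cell.
from_table3 <- table3 %>%
  separate(rate, into = c("cases", "population"), sep = "/", convert = TRUE)

# table4a and table4b each use the years as column names;
# lengthen both, then join them on the case (country, year).
cases_long <- pivot_longer(table4a, cols = c(`1999`, `2000`),
                           names_to = "year", values_to = "cases")
pop_long   <- pivot_longer(table4b, cols = c(`1999`, `2000`),
                           names_to = "year", values_to = "population")
from_table4 <- left_join(cases_long, pop_long, by = c("country", "year")) %>%
  mutate(year = as.integer(year))  # column names arrive as text
```

Each result has the same six rows and four variables as `table1`, although the stored column types may differ slightly (for example integer versus double).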
Galton Data
In the 1880s, Francis Galton began developing a mathematical theory of
evolution.
Here’s part of a page from his lab notebook. Discuss the following in
groups:
- What might he investigate with these data (e.g., Research
Question)?
- Are these data tidy according to our definition?
- What are the cases?
- What are the variables?
- How many rows of data should the result have?
- How many columns of data should the result have? What is the data
type of each column?
- What are some additional variables (not yet shown) that might be of
interest? How would you recommend showing that information in the data
table?
Activity 01: Tidy Data
Work to put these tables into tidy form.
- Work with your partner
- As a team, you will put two different data sets into “tidy” form.
- See Canvas for details
  - view-only source data is provided
  - use any software you like
  - you must submit a CSV to Canvas
  - do not use spaces in your file names
- Tip: Sketch things out together on paper before you do
anything on the computer
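If you choose R for the activity, saving a table as a CSV might look like the sketch below. The data frame, its column names, and the file name are all placeholders, not the expected answer; the file is written to a temporary folder here so the example can be run anywhere.

```r
# Placeholder table standing in for your own tidy result.
tidy_table <- data.frame(case_id = 1:2, value = c(3.5, 4.1))

# Hypothetical file name: underscores instead of spaces.
out_file <- file.path(tempdir(), "activity01_tidy_table1.csv")

# row.names = FALSE keeps R's row numbers out of the file.
write.csv(tidy_table, out_file, row.names = FALSE)

# Reading the file back is a quick check that it saved as intended.
read_back <- read.csv(out_file)
```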
Table 1: Galton’s Height measurements data
Table 2: Presidents

Codebooks
What is a codebook?
A codebook describes the contents, structure,
and layout of a data collection.
A well-documented codebook contains information intended to be
complete and self-explanatory for each variable in a data file.
https://www.icpsr.umich.edu/web/ICPSR/cms/1983
Federal Election Commission
https://www.fec.gov/data/browse-data/?tab=bulk-data
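A codebook can itself be kept as a small table with one row per variable. The sketch below assumes a hypothetical data frame `survey`; the variable names, descriptions, and allowed values are invented for illustration, and real codebooks usually carry more detail (units, missing-value codes, collection notes).

```r
# Hypothetical data file being documented.
survey <- data.frame(
  respondent_id = 1:3,
  age_years     = c(34, 51, 27),
  region        = factor(c("north", "south", "north"))
)

# One row per variable: name, storage type, meaning, and allowed values.
codebook <- data.frame(
  variable    = names(survey),
  type        = unname(vapply(survey, function(col) class(col)[1], character(1))),
  description = c("Unique identifier for each respondent (the case)",
                  "Age at the time of the survey, in years",
                  "Region of residence"),
  allowed     = c("positive integers",
                  "0 or greater",
                  paste(levels(survey$region), collapse = ", ")),
  stringsAsFactors = FALSE
)

codebook
```

Building the `variable` and `type` columns from the data keeps the codebook in step with the file; the descriptions still have to be written by hand.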